DVD: A Model for Event Diversified Versions Discovery
Authors
Abstract
With the development of Event Detection and Tracking techniques, it is feasible to gather text from many sources and structure it into events that are constructed online automatically and updated over time. An event is usually described by several diversified versions, and users are often eager to know all of them. Given the huge quantity of documents, however, it is almost impossible for users to read everything. In this paper, we formally define the problem of event diversified versions discovery and introduce a novel, principled model (called DVD) for discovering the diversified versions of an event. Unlike traditional clustering methods, we apply an iterative algorithm on a bipartite graph that integrates co-occurrence and semantics to select the popular words and filter them, reducing the tight correlation between documents within a specific event. Hybrid link structures between words are used to find hierarchical relationships. We employ a web community discovery algorithm to construct virtual documents, each consisting of a bag of words that indicates one of the diversified versions. Under the Rocchio classification framework, we then classify the documents into the diversified versions. Using our novel evaluation method, empirical experiments on two real datasets show that DVD is effective and outperforms various related algorithms, including classic K-means and LDA.
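The final classification step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each virtual document acts as a class prototype, that documents are represented by raw term counts, and that cosine similarity decides the assignment. The names `cosine`, `classify_rocchio`, and the sample data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_rocchio(documents, virtual_documents):
    """Assign each document to the most similar prototype.

    Assumption: each virtual document (a bag of words) is used directly
    as the prototype vector of one version.
    """
    prototypes = {label: Counter(words)
                  for label, words in virtual_documents.items()}
    assignments = []
    for text in documents:
        vec = Counter(text.lower().split())
        best = max(prototypes, key=lambda label: cosine(vec, prototypes[label]))
        assignments.append(best)
    return assignments


# Hypothetical data: two versions of one event and two documents.
virtual_documents = {
    "version_a": ["accident", "brake", "failure"],
    "version_b": ["driver", "speeding", "alcohol"],
}
documents = [
    "investigators blame brake failure for the accident",
    "police say the driver was speeding",
]
print(classify_rocchio(documents, virtual_documents))
```

A real system would likely weight terms (for example with TF-IDF) rather than use raw counts; the sketch only shows how documents map onto bag-of-words prototypes.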
Similar resources
One Film, or Many?: The Multiple Texts of the Colonial Korean Film Volunteer
Until recently, studies on films from colonial Korea in the Japanese empire had to rely primarily on secondary texts, such as memoirs, journal and newspaper articles, and film reviews. The recent discovery of original film texts from archives in Japan, China, Russia, and elsewhere and their availability on DVD format, prompted an important turning point in the scholarship. However, juxtaposing ...
Considering Uncertainty in Modeling Historical Knowledge
Simplifying and structuring qualitatively complex knowledge, quantifying it in a certain way to make it reusable and easily accessible are all aspects that are not new to historians. Computer science is currently approaching a solution to some of these problems, or at least making it easier to work with historical data. In this paper, we propose a historical knowledge representation model takin...
Mining Event Temporal Boundaries from News Corpora through Evolution Phase Discovery
Currently news flood spreads throughout the web. The techniques of Event Detection and Tracking makes it feasible to gather and structure text information into events which are constructed online automatically and updated temporally. Users are usually eager to browse the whole event evolution. With the huge quantity of documents, it is almost impossible for users to read all of them. In this pa...
Information Discovery and the Long Tail of Motion Picture Content
Recent papers have shown that, in contrast to "the Long Tail" theory, movie sales remain concentrated in a small number of hits. These papers have argued that concentrated sales can be explained, in part, by heterogeneity in quality and increasing returns from social effects. Our research analyzes an additional explanation: how incomplete information may skew sales patterns. We use the movie br...
Designing a model for holding mega sport events with an emphasis on national brand development
The present study seeks a model for holding major sporting events with an emphasis on national brand development. The research method is a mixture of qualitative and quantitative. In the quantitative part, the statistical population, including professors and sports activists, and the statistical sample was done by stratified random sampling. Adequate number for modeling in pls software was 300 ...
Journal title:
Volume, issue
Pages -
Publication date: 2011